Small Sample Properties of Nonparametric Bootstrap t Confidence Intervals
Authors: Porter, Rao, Ku, Poirot, and Dakins
Abstract
Confidence interval construction for central tendency is a problem of practical consequence for those who must analyze air contaminant data. Determination of compliance with relevant ambient air quality criteria and assessment of associated health risks depend upon quantifying the uncertainty of estimated mean pollutant concentrations. The bootstrap is a resampling technique that has been steadily gaining popularity and acceptance during the past several years. A potentially powerful application of the bootstrap is the construction of confidence intervals for any parameter of any underlying distribution. Properties of bootstrap confidence intervals were determined for samples generated from lognormal, gamma, and Weibull distributions. Bootstrap t intervals, while having smaller coverage errors than Student's t or other bootstrap methods, under-cover for small samples from skewed distributions. Therefore, we caution against using the bootstrap to construct confidence intervals for the mean without first considering the effects of sample size and skew. When sample sizes are small, one might consider using the median as an estimate of central tendency. Confidence intervals for the median are easy to construct and do not under-cover. Data collected by the Northeast States for Coordinated Air Use Management (NESCAUM) are used to illustrate application of the methods discussed.

INTRODUCTION

The bootstrap is a resampling technique [1] that has been steadily gaining popularity and acceptance during the past several years. A potentially powerful application of the bootstrap is the construction of confidence intervals for any parameter of any underlying distribution [2]. In this paper we address the use of the bootstrap technique to construct confidence intervals for the mean of an unknown distribution using small sample sizes. Confidence interval construction for central tendency is a problem of practical consequence for those who must analyze environmental monitoring data.
IMPLICATIONS

Determination of compliance with relevant ambient air quality criteria and assessment of associated health risks depend upon quantifying the uncertainty of estimated mean pollutant concentrations. The nonparametric bootstrap t has become a popular method for constructing confidence intervals for the mean of an unknown distribution. However, users should be aware that this method does not provide nominal coverage probabilities when used with small samples from probability distributions typically used to characterize pollutant concentrations. It is suggested that the sample median, with its associated confidence interval, is a more reliable estimator of central tendency for this application.

Porter, Rao, Ku, Poirot, and Dakins. Journal of the Air & Waste Management Association, Volume 47, November 1997, p. 1198.

Determination of compliance with relevant ambient air quality criteria and assessment of associated health risks, for example, depend upon quantifying the uncertainty of estimated mean pollutant concentrations. In this paper we focus on distribution-free techniques and small sample sizes. Confidence intervals for the mean of an unknown distribution can be based on Student's t, provided the sample size is large. When the underlying distribution is not normal and sample sizes are small, Student's t will under-cover, meaning the probability that the interval contains the parameter of interest is less than nominal. A nominal 95% confidence interval based on small samples from a skewed distribution and Student's t statistic will contain the mean less than 95% of the time in repeated sampling. This paper provides some guidance about adequate sample sizes for use with Student's t. Bootstrap methods offer the promise of confidence intervals having coverage close to nominal. However, one must choose from among several bootstrap confidence interval methods that have appeared in the literature.
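The under-coverage of Student's t with small, skewed samples can be checked with a short simulation. The following Python sketch is illustrative only and is not the simulation design used in the paper: the lognormal parameters (mu = 0, sigma = 1.5), the sample size of 10, the number of trials, and all function names are assumptions chosen for the example.

```python
import math
import random
import statistics

N = 10          # assumed small sample size
T_CRIT = 2.262  # 0.975 quantile of Student's t with N - 1 = 9 degrees of freedom
SIGMA = 1.5     # assumed lognormal shape parameter (larger means more skew)


def student_t_interval(sample, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return mean - half_width, mean + half_width


def estimate_coverage(trials=5000, seed=0):
    """Fraction of nominal 95% intervals that contain the true lognormal mean."""
    rng = random.Random(seed)
    true_mean = math.exp(SIGMA ** 2 / 2)  # mean of a lognormal with mu = 0
    hits = 0
    for _ in range(trials):
        sample = [rng.lognormvariate(0.0, SIGMA) for _ in range(N)]
        lower, upper = student_t_interval(sample, T_CRIT)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials


coverage = estimate_coverage()
print(f"estimated coverage of the nominal 95% interval: {coverage:.3f}")
```

Under these assumed settings the estimated coverage falls noticeably short of the nominal 95%. Note that `T_CRIT` is tied to `N`; changing the sample size requires the matching t quantile.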
Perhaps the best-known bootstrap confidence interval methods are the percentile method (BSP), the bootstrap t (BST), and the bias-corrected and accelerated bootstrap (BCA). Large sample theory has shown the BST to provide narrower intervals, with smaller coverage errors, than the BSP or BCA methods [3]. The BST is particularly applicable to location statistics, such as the sample mean, where there is an appropriate estimator of its standard error [4]. When there is no obvious standard error formula for the parameter of interest, as is the case for the sample median, one may resort to the BCA or double bootstrapping [4]. Large sample theory assists the choice of a method, but does not always provide a reliable indication of performance with small samples.

In this paper, we use simulation studies and data collected by the Northeast States for Coordinated Air Use Management (NESCAUM) to illustrate the small sample properties of the BST. Although these networks have a primary focus on visibility impairment, the resulting trace element data also provide an excellent opportunity for assessing long-term exposures to a variety of potentially toxic trace metals and other substances [5].

BST intervals, while having smaller coverage errors than Student's t or other bootstrap methods, under-cover for small samples from skewed distributions. Therefore, we caution against using the BST to construct confidence intervals for the mean without first considering the effects of sample size and skew. When sample sizes are small, one might consider using the sample median as an estimate of central tendency. Confidence intervals for the median are easy to construct and do not under-cover. In addition, median confidence intervals can be constructed for highly censored samples.
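The two interval types contrasted above can be sketched in a few lines of Python: a nonparametric bootstrap t interval for the mean, and a distribution-free interval for the median built from order statistics. This is a minimal sketch, not the paper's exact algorithm. The function names, the number of resamples, the quantile rule, and the example data (made-up values, not NESCAUM measurements) are all assumptions for illustration. Both functions target nominal 1 − 2α coverage.

```python
import math
import random
import statistics


def empirical_quantile(sorted_values, p):
    """Order statistic at position ceil(p * m), clamped to the valid range."""
    m = len(sorted_values)
    k = min(max(math.ceil(p * m), 1), m)
    return sorted_values[k - 1]


def bootstrap_t_interval(sample, alpha=0.025, n_boot=2000, seed=0):
    """Bootstrap t interval for the mean with nominal coverage 1 - 2*alpha."""
    rng = random.Random(seed)
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t_stars = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        s_star = statistics.stdev(resample)
        if s_star == 0.0:
            continue  # skip degenerate resamples (all values identical)
        t_stars.append((statistics.fmean(resample) - mean) / (s_star / math.sqrt(n)))
    t_stars.sort()
    t_lo = empirical_quantile(t_stars, alpha)
    t_hi = empirical_quantile(t_stars, 1 - alpha)
    # The upper quantile of t* sets the lower limit, and vice versa.
    return mean - t_hi * se, mean - t_lo * se


def median_interval(sample, alpha=0.025):
    """Order-statistic interval for the median with coverage of at least
    1 - 2*alpha for continuous data: [x_(k), x_(n-k+1)], where k is the
    largest rank with P(Binomial(n, 1/2) <= k - 1) <= alpha."""
    xs = sorted(sample)
    n = len(xs)
    cumulative = 0.0
    k = 0
    for j in range(n // 2):
        cumulative += math.comb(n, j) / 2 ** n
        if cumulative > alpha:
            break
        k = j + 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    return xs[k - 1], xs[n - k]


# Made-up, right-skewed concentrations for illustration only.
concentrations = [3.1, 4.7, 5.2, 6.0, 6.8, 8.4, 9.9, 12.5, 18.3, 41.0]
print("bootstrap t interval for the mean:", bootstrap_t_interval(concentrations))
print("order-statistic interval for the median:", median_interval(concentrations))
```

With ten observations and α = 0.025, the median interval runs from the second-smallest to the second-largest value, so its endpoints are always observed data points and no standard error estimate is needed.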
As sample sizes increase, the actual coverage of the BST approaches nominal more rapidly than that of Student's t, and, if the underlying distribution can be identified, better methods, including a parametric bootstrap, can be applied.

METHODS

Database for Examples

Data collected by NESCAUM were used to illustrate the methods. The NESCAUM Regional Particle Monitoring Network samples fine particulate matter (< 2.5 microns) at seven air monitoring stations in New York, New Jersey, Connecticut, Rhode Island, Maine, Massachusetts, New Hampshire, and Vermont. Twenty-four-hour composite samples are collected on Teflon filters every Wednesday, Saturday, and every sixth day that is not a Wednesday or Saturday, for a total of about 145 samples annually. Samples are analyzed by Crocker Nuclear Laboratory at the University of California at Davis for mass (gravimetric), light absorption (integrating plate), and multiple trace elements (proton elastic scattering analysis and proton-induced X-ray emission) [6-8].

Zinc concentrations at Ringwood, NJ were chosen to illustrate the methods discussed in this paper. Annual mean and day-of-the-week concentrations are useful for assessing human health risk associated with long-term exposure to toxic air contaminants. The zinc concentrations found at these locations are well below the 24-hour-maximum standard of 0.15 ng/m³ for particulate zinc. However, arsenic concentrations detected in the network, while often well above the standard, are difficult to characterize statistically or track because of censoring. In addition, large amounts of zinc and arsenic or other toxic substances may originate from the same source. Hence, there is interest in using zinc as a surrogate for particulate arsenic.

Confidence Intervals

Student's t. A 1 − 2α confidence interval for the mean using Student's t is given by:
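In its standard textbook form, and with notation assumed here rather than taken from the paper ($\bar{x}$ the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $t_{n-1}^{(1-\alpha)}$ the $1-\alpha$ quantile of Student's t with $n-1$ degrees of freedom), such an interval can be written as:

```latex
\bar{x} \;\pm\; t_{n-1}^{(1-\alpha)} \, \frac{s}{\sqrt{n}}
```

With α = 0.025 this is the familiar nominal 95% interval; for n = 10 the multiplier is approximately 2.262.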
Similar resources
Statistical Topology Using the Nonparametric Density Estimation and Bootstrap Algorithm
This paper presents approximate confidence intervals for each function of parameters in a Banach space based on a bootstrap algorithm. We apply kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality distribution function estimator of random variables using integrated mean square error (IMSE). The results of simulation studies show a significant impro...
Small Sample Bootstrap Confidence Intervals for Long-Memory Parameter
The log periodogram regression is widely used in empirical applications because of its simplicity, since only a least squares regression is required to estimate the memory parameter, d, its good asymptotic properties and its robustness to misspecification of the short term behavior of the series. However, the asymptotic distribution is a poor approximation of the (unknown) finite sample distrib...
Stabilizing bootstrap-t confidence intervals for small samples
A major use of the bootstrap methodology is in the construction of nonparametric confidence intervals. Although no consensus has yet been reached on the best way to proceed, theoretical and empirical evidence indicate that bootstrap-t intervals provide a reasonable solution to this problem. However, when applied to small data sets, these intervals can be unusually wide and unstable. The author ...
Local Bootstrap Approach for the Estimation of the Memory Parameter
The log periodogram regression is widely used in empirical applications because of its simplicity to estimate the memory parameter, d, its good asymptotic properties and its robustness to misspecification of the short term behavior of the series. However, the asymptotic distribution is a poor approximation of the (unknown) finite sample distribution if the sample size is small. Here the finite ...
The comparison of parametric and nonparametric bootstrap methods for reference interval computation in small sample size groups
According to the IFCC, to determine the population-based reference interval (RI) of a test, 120 reference individuals are required. However, for some age groups such as newborns and preterm babies, it is difficult to obtain enough reference individuals. In this study, we consider both parametric and nonparametric bootstrap methods for estimating RIs and the associated confidence intervals (CIs)...